Additional Parameters for CSV, JSON, and Parquet File Formats

When you configure the CSV, JSON, or Parquet file format, you can specify additional parameters beyond basic options such as field delimiter and parse header, for example ignoreLeadingWhiteSpace or multiLine. These parameters control how the file is read and interpreted during data ingestion or processing, and they help ensure that the data is correctly structured and aligned with your schema before ingestion.

For example, selecting the correct Field Delimiter ensures that the data integration tool splits the data into the correct columns during ingestion. Parse Header specifies whether the first line of a CSV file is treated as a header row containing column names; setting it correctly ensures that header values are not loaded as data.
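
As an illustration only, the following is a minimal sketch of how these two settings might be expressed in a Spark-based ingestion job using PySpark; the session setup and file path are placeholders, and the option names sep and header are the Spark equivalents assumed here for Field Delimiter and Parse Header.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-delimiter-header-demo").getOrCreate()

# "sep" plays the role of Field Delimiter: it tells the reader where one
# column ends and the next begins. "header" plays the role of Parse Header:
# when True, the first line supplies column names and is not loaded as data.
df = (
    spark.read
    .option("sep", "|")                  # placeholder: pipe-delimited source file
    .option("header", True)
    .csv("/data/input/customers.csv")    # placeholder path
)
df.printSchema()
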

CSV

Using additional parameters for the CSV file format ensures:

  • Data is parsed correctly.

  • Schema mismatches or incorrect column alignment are prevented.

  • Support is provided for data from multiple sources with different formatting conventions.

The following additional parameters are supported for the CSV file format; an illustrative usage sketch follows the table.

Parameter Description
comment Specifies the character that identifies comment lines in the file. Any line beginning with this character is ignored during processing. Example: Setting # treats all lines starting with # as comments.
date_format Defines the format of date values in the data files (data loading) or table (data unloading).
emptyValue Sets the string representation of an empty value.
encoding Defines the character encoding used to read the file. Use UTF-8 for most datasets, especially those containing special or non-English characters.
escape Character used to escape special symbols within a field, such as a quote or delimiter. Example: Using \ allows "New \"York\"" to be read correctly.
ignoreLeadingWhiteSpace When enabled, removes extra white spaces at the beginning of each field value. Helps ensure consistent parsing of fields with irregular spacing.
ignoreTrailingWhiteSpace When enabled, removes white spaces at the end of each field value. Useful for avoiding mismatches caused by unintended spaces.
multiLine Allows fields to span multiple lines if enclosed in quotes. Enable this option when your data contains line breaks within text fields.
nullValue Defines the string that should be interpreted as a null or missing value. Example: Setting null or NA ensures those tokens are treated as empty values.
quote Character used to enclose values containing delimiters, quotes, or line breaks. Commonly set to ". Example: "New York, USA" is treated as one value.
time_format Defines the format of time values in the data files (data loading) or table (data unloading).
empty_field_as_null When loading data, specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters (for example, ,,). When unloading data, this option is used in combination with FIELD_OPTIONALLY_ENCLOSED_BY.

field_delimiter One or more singlebyte or multibyte characters that separate fields in an input file (data loading) or unloaded file (data unloading).
parse_header Boolean that specifies whether to use the first row headers in the data files to determine column names.
record_delimiter One or more singlebyte or multibyte characters that separate records in an input file (data loading) or unloaded file (data unloading).
trim_space Boolean that specifies whether to remove white space from fields.
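
As an illustrative sketch only (assuming a Spark-based ingestion engine), the PySpark read below combines several of the parameters listed above; the camelCase option names match the Spark CSV reader, dateFormat stands in for date_format, and the file path and format values are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-options-demo").getOrCreate()

df = (
    spark.read
    .option("header", True)                    # parse_header: first line holds column names
    .option("sep", ",")                        # field_delimiter
    .option("comment", "#")                    # skip lines starting with '#'
    .option("quote", '"')                      # values wrapped in double quotes
    .option("escape", "\\")                    # backslash escapes quotes inside fields
    .option("nullValue", "NA")                 # treat "NA" as NULL
    .option("emptyValue", "")                  # string representation of an empty value
    .option("encoding", "UTF-8")
    .option("ignoreLeadingWhiteSpace", True)   # trim spaces at the start of each field
    .option("ignoreTrailingWhiteSpace", True)  # trim spaces at the end of each field
    .option("multiLine", True)                 # quoted fields may contain line breaks
    .option("dateFormat", "yyyy-MM-dd")        # date_format (placeholder pattern)
    .csv("/data/input/orders.csv")             # placeholder path
)
df.show(5)
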

 

JSON

Using additional parameters for the JSON file format ensures:

  • Precise schema discovery during ingestion.

  • Consistent rule application in profiling, validation, and transformation.

  • Reduction in errors caused by incorrect file interpretation (e.g., misaligned columns, missing headers).

The following additional parameters are supported for the JSON file format; an illustrative usage sketch follows the table.

Parameter Description
allowSingleQuotes When enabled, allows single quotes in addition to double quotes around field names and string values.
allowUnquotedFieldNames When enabled, allows field names that are not enclosed in quotes.
date_format Defines the format of date values in the data files (data loading) or table (data unloading).
time_format Defines the format of time values in the data files (data loading) or table (data unloading).
multi_line Boolean that specifies whether multiple lines are allowed.
strip_outer_array Boolean that instructs the JSON parser to remove outer brackets (i.e. [ ]).
null_if String used to convert to and from SQL NULL.
encoding For reading, forcibly sets one of the standard basic or extended encodings for the JSON files. For writing, specifies the encoding (charset) of saved JSON files. JSON built-in functions ignore this option.

trim_space Boolean that specifies whether to remove leading and trailing white space from strings.
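
A minimal PySpark sketch follows, again assuming a Spark-based engine; the option names match the Spark JSON reader, timestampFormat is used here as the closest analog of time_format, and the file path and patterns are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-options-demo").getOrCreate()

df = (
    spark.read
    .option("multiLine", True)                            # records may span multiple lines
    .option("allowSingleQuotes", True)                    # accept 'name' as well as "name"
    .option("allowUnquotedFieldNames", True)              # accept {name: "x"} style keys
    .option("dateFormat", "yyyy-MM-dd")                   # date_format (placeholder pattern)
    .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")   # time_format analog (placeholder)
    .option("encoding", "UTF-8")
    .json("/data/input/events.json")                      # placeholder path
)
df.printSchema()
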

 

Parquet

Using additional parameters for the Parquet file format ensures:

  • Improved schema and data type handling.

  • Fine-grained control over data ingestion.

  • Better compatibility with external systems.

The following additional parameters are supported for the Parquet file format; an illustrative usage sketch follows the table.

Parameter Description
binary_as_text Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to FALSE, Snowflake interprets these columns as binary data.
trim_space Boolean that specifies whether to remove leading and trailing white space from strings.
use_vectorized_scanner Boolean that specifies whether to use a vectorized scanner for loading Parquet files.
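
Because binary_as_text, trim_space, and use_vectorized_scanner correspond to Snowflake Parquet file format options, the sketch below shows one possible way to set them by creating a named file format through the snowflake-connector-python package; the connection details, format name, and the commented COPY INTO reference are placeholders and assumptions, not a documented workflow of this product.

import snowflake.connector

# Placeholder connection details; replace with your account credentials.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)

try:
    cur = conn.cursor()
    # Define a named Parquet file format using the parameters described above.
    cur.execute(
        """
        CREATE OR REPLACE FILE FORMAT my_parquet_format
          TYPE = PARQUET
          BINARY_AS_TEXT = TRUE          -- read untyped binary columns as UTF-8 text
          TRIM_SPACE = TRUE              -- strip leading and trailing white space from strings
          USE_VECTORIZED_SCANNER = TRUE  -- use the vectorized Parquet scanner
        """
    )
    # The format can then be referenced when loading, for example:
    # COPY INTO my_table FROM @my_stage FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
finally:
    conn.close()
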

 

What's next?

Ingesting Data from Amazon S3 into a Snowflake Data Lake